AITopics | self-knowledge distillation

self-knowledge distillation

Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval

Song, Jonghyun, Lee, Youngjune, Cho, Gyu-Hwung, Song, Ilhyeon, Kim, Saehun, Jo, Yohan

arXiv.org Artificial Intelligence, Aug-26-2025

Vision-Language Pretrained (VLP) models have achieved impressive performance on multimodal tasks, including text-image retrieval, based on dense representations. Meanwhile, Learned Sparse Retrieval (LSR) has gained traction in text-only settings due to its interpretability and efficiency with fast term-based lookup via inverted indexes. Inspired by these advantages, recent work has extended LSR to the multimodal domain. However, these methods often rely on computationally expensive contrastive pre-training, or distillation from a frozen dense model, which limits the potential for mutual enhancement. To address these limitations, we propose a simple yet effective framework that enables bi-directional learning between dense and sparse representations through Self-Knowledge Distillation. This bi-directional learning is achieved using an integrated similarity score (a weighted sum of dense and sparse similarities), which serves as a shared teacher signal for both representations. To ensure efficiency, we fine-tune the final layer of the dense encoder and the sparse projection head, enabling easy adaptation of any existing VLP model. Experiments on MSCOCO and Flickr30k demonstrate that our sparse retriever not only outperforms existing sparse baselines, but also achieves performance comparable to, or even surpassing, its dense counterparts, while retaining the benefits of sparse models.
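
Illustrative sketch (not taken from the paper): the abstract describes an integrated similarity score, a weighted sum of dense and sparse similarities, used as a shared teacher for both representations. One plausible reading is shown below in plain Python. The mixing `weight`, the `temperature`, and the use of a KL divergence between softmax distributions are assumptions made for illustration; the abstract does not specify the loss form.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def integrated_scores(dense, sparse, weight=0.5):
    """Weighted sum of dense and sparse similarities (weight is hypothetical)."""
    return [weight * d + (1.0 - weight) * s for d, s in zip(dense, sparse)]

def self_distillation_losses(dense, sparse, weight=0.5, temperature=1.0):
    """Pull both score distributions toward the same integrated teacher."""
    teacher = softmax(integrated_scores(dense, sparse, weight), temperature)
    loss_dense = kl_divergence(teacher, softmax(dense, temperature))
    loss_sparse = kl_divergence(teacher, softmax(sparse, temperature))
    return loss_dense, loss_sparse

# One text query scored against three candidate images (made-up numbers).
dense_sims = [2.0, 0.5, 0.1]
sparse_sims = [1.0, 1.2, 0.0]
loss_d, loss_s = self_distillation_losses(dense_sims, sparse_sims)
```

Because the teacher is built from both score sets, each representation receives a signal that includes information from the other, which is the bi-directional aspect the abstract emphasizes.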

artificial intelligence, machine learning, natural language, (14 more...)

arXiv:2508.16707

Country: Asia > South Korea (0.17)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Efficient Lung Ultrasound Severity Scoring Using Dedicated Feature Extractor

Guo, Jiaqi, Wu, Yunnan, Kaimakamis, Evangelos, Petmezas, Georgios, Papageorgiou, Vasileios E., Maglaveras, Nicos, Katsaggelos, Aggelos K.

arXiv.org Artificial Intelligence, Jan-21-2025

With the advent of the COVID-19 pandemic, ultrasound imaging has emerged as a promising technique for COVID-19 detection, due to its non-invasive nature, affordability, and portability. In response, researchers have focused on developing AI-based scoring systems to provide real-time diagnostic support. However, the limited size and lack of proper annotation in publicly available ultrasound datasets pose significant challenges for training a robust AI model. This paper proposes MeDiVLAD, a novel pipeline to address the above issue for multi-level lung-ultrasound (LUS) severity scoring. In particular, we leverage self-knowledge distillation to pretrain a vision transformer (ViT) without labels and aggregate frame-level features via dual-level VLAD aggregation. We show that with minimal finetuning, MeDiVLAD outperforms conventional fully-supervised methods in both frame- and video-level scoring, while offering classification reasoning with exceptional quality. This superior performance enables key applications such as the automatic identification of critical lung pathology areas and provides a robust solution for broader medical video classification tasks.
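
Illustrative sketch (not taken from the paper): the abstract mentions aggregating frame-level features with VLAD. The snippet below shows generic single-level VLAD pooling with hard assignment, which turns a variable number of frame descriptors into one fixed-length vector. The centroids and feature values are made up, and the paper's dual-level variant is not described in the abstract, so it is not reproduced here.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate descriptors into one L2-normalized vector of residual sums."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2 distance).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[k])),
        )
        # Accumulate the residual between the descriptor and that centroid.
        for j in range(dim):
            residual_sums[nearest][j] += x[j] - centroids[nearest][j]
    # Concatenate per-centroid sums and L2-normalize the result.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Three hypothetical 2-D frame features and two hypothetical centroids.
frame_features = [[0.9, 0.1], [1.2, 0.1], [-1.0, 0.2]]
centroids = [[1.0, 0.0], [-1.0, 0.0]]
video_vector = vlad(frame_features, centroids)
```

The output length is the number of centroids times the feature dimension, regardless of how many frames a video contains.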

aggregation, artificial intelligence, machine learning, (15 more...)

arXiv:2501.12524

Country:

Europe > Greece > Central Macedonia > Thessaloniki (0.05)
North America > United States > Illinois (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.55)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

Generative Dataset Distillation Based on Self-knowledge Distillation

Li, Longzhen, Li, Guang, Togo, Ren, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki

arXiv.org Artificial Intelligence, Jan-7-2025

Dataset distillation is an effective technique for reducing the cost and complexity of model training while maintaining performance by compressing large datasets into smaller, more efficient versions. In this paper, we present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data, ...

Generative dataset distillation aims to condense the information from large-scale datasets into a generative model rather than a static dataset [16, 17]. Unlike traditional dataset distillation methods, which produce a smaller fixed dataset, generative dataset distillation trains a model capable of generating effective synthetic data on the fly [18]. This approach has been shown to offer better cross-architecture performance compared to traditional methods, while also providing greater flexibility in the data it generates. The generative dataset distillation process typically consists of two steps.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv:2501.04202

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Security & Privacy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Kim, Eungbeom, Kim, Hantae, Lee, Kyogu

arXiv.org Machine Learning, Jun-12-2024

The Transformer encoder with the connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR suffers from disagreement between teacher and student models in frame-level alignment, which ultimately hinders it from improving the student model's performance. To resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during training. In contrast to the conventional method using separate teacher and student models, this study introduces a simple and effective method that shares encoder layers and uses the sub-model as the student model. Overall, our approach is effective in improving both resource efficiency and performance. We also conducted an experimental analysis of the spike timings to illustrate that the proposed method improves performance by reducing the alignment disagreement.

alignment, distillation, knowledge distillation, (13 more...)

arXiv:2406.07909

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification

Xu, Bo, Zheng, Hao, Hu, Zhigang, Yang, Liu, Zheng, Meiguang

arXiv.org Artificial Intelligence, Nov-26-2023

In current synthetic aperture radar (SAR) object classification, one of the major challenges is the severe overfitting issue due to the limited dataset (few-shot) and noisy data. Considering the advantages of knowledge distillation as a learned label smoothing regularization, this paper proposes a novel Double Reverse Regularization Network based on Self-Knowledge Distillation (DRRNet-SKD). Specifically, through exploring the effect of distillation weight on the process of distillation, we are inspired to adopt the double reverse thought to implement an effective regularization network by combining offline and online distillation in a complementary way. Then, the Adaptive Weight Assignment (AWA) module is designed to adaptively assign two reverse-changing weights based on the network performance, allowing the student network to better benefit from both teachers. The experimental results on OpenSARShip and FUSAR-Ship demonstrate that DRRNet-SKD exhibits remarkable performance improvement on classical CNNs, outperforming state-of-the-art self-knowledge distillation methods.

classification, distillation, ship classification, (13 more...)

arXiv:2311.15231

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Sports > Football (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Promoting Generalized Cross-lingual Question Answering in Few-resource Scenarios via Self-knowledge Distillation

Carrino, Casimiro Pio, Escolano, Carlos, Fonollosa, José A. R.

arXiv.org Artificial Intelligence, Sep-29-2023

Despite substantial progress in multilingual extractive Question Answering (QA), models with high and uniformly distributed performance across languages remain challenging, especially for languages with limited resources. We study cross-lingual transfer mainly focusing on the Generalized Cross-Lingual Transfer (G-XLT) task, where the question language differs from the context language, a challenge that has received limited attention thus far. Our approach seeks to enhance cross-lingual QA transfer using a high-performing multilingual model trained on a large-scale dataset, complemented by a few thousand aligned QA examples across languages. Our proposed strategy combines cross-lingual sampling and advanced self-distillation training in generations to tackle the previous challenge. Notably, we introduce the novel mAP@k coefficients to fine-tune the self-knowledge distillation loss, dynamically regulating the teacher model's knowledge to perform a balanced and effective knowledge transfer. We extensively evaluate our approach to assess XLT and G-XLT capabilities in extractive QA. Results reveal that our self-knowledge distillation approach outperforms standard cross-entropy fine-tuning by a significant margin. Importantly, when compared to a strong baseline that leverages a sizeable volume of machine-translated data, our approach shows competitive results despite the considerable challenge of operating within resource-constrained settings, even in zero-shot scenarios. Beyond performance improvements, we offer valuable insights through comprehensive analyses and an ablation study, further substantiating the benefits and constraints of our approach. In essence, we propose a practical solution to improve cross-lingual QA transfer by leveraging a few data resources in an efficient way.

few-resource scenario, promoting generalized, self-knowledge distillation

arXiv:2309.17134

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)

Siamese Sleep Transformer For Robust Sleep Stage Scoring With Self-knowledge Distillation and Selective Batch Sampling

Kwak, Heon-Gyu, Kweon, Young-Seok, Shin, Gi-Hwan

arXiv.org Artificial Intelligence, Dec-11-2022

In this paper, we propose a Siamese sleep transformer (SST) that effectively extracts features from single-channel raw electroencephalogram signals for robust sleep stage scoring. Despite the significant advances in sleep stage scoring in the last few years, most studies have focused mainly on improving model performance. However, other problems still exist: the bias of labels in datasets and the instability of model performance across repeated training runs. To alleviate these problems, we propose the SST, a novel sleep stage scoring model with a selective batch sampling strategy and self-knowledge distillation. To evaluate how robust the model was to the bias of labels, we used different datasets for training and testing: the Sleep Heart Health Study and the Sleep-EDF datasets. In this condition, the SST showed competitive performance in sleep stage scoring. In addition, we demonstrated the effectiveness of the selective batch sampling strategy through a reduction in the standard deviation of performance across repeated training runs. These results suggest that the SST extracted features that are robust to the bias of labels in datasets, and that the selective batch sampling strategy improved model robustness in training.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv:2212.13919

Country: Asia > South Korea > Seoul > Seoul (0.05)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation

Lee, Dongkyu, Cheung, Ka Chun, Zhang, Nevin L.

arXiv.org Artificial Intelligence, Oct-22-2022

Overconfidence has been shown to impair generalization and calibration of a neural network. Previous studies remedy this issue by adding a regularization term to a loss function, preventing a model from making a peaked distribution. Label smoothing smooths target labels with a pre-defined prior label distribution; as a result, a model learns to maximize the likelihood of predicting the soft label. Nonetheless, the amount of smoothing is the same in all samples and remains fixed in training. In other words, label smoothing does not reflect the change in probability distribution mapped by a model over the course of training. To address this issue, we propose a regularization scheme that brings dynamic nature into the smoothing parameter by taking model probability distribution into account, thereby varying the parameter per instance. A model in training self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work in bridging label smoothing and knowledge distillation, our work utilizes self-knowledge as a prior label distribution in softening target labels, and presents theoretical support for the regularization effect by knowledge distillation and the dynamic smoothing parameter. Our regularizer is validated comprehensively, and the result illustrates marked improvements in model generalization and calibration, enhancing robustness and trustworthiness of a model.
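
Illustrative sketch (not taken from the paper): the abstract describes a per-instance smoothing parameter derived from the model's probability distribution, with the model's own prediction used as the prior label distribution. The snippet below shows one way this could look in plain Python. The entropy-based rule for `alpha` is an assumption chosen for illustration (more peaked predictions get more smoothing); the abstract does not give the formula.

```python
import math

def normalized_entropy(probs, eps=1e-12):
    """Entropy of a distribution divided by its maximum, log(num_classes)."""
    h = -sum(p * math.log(p + eps) for p in probs)
    return h / math.log(len(probs))

def adaptive_soft_target(model_probs, gold_index):
    """Blend the one-hot label with the model's own distribution."""
    # Assumed rule: the more peaked the prediction, the larger alpha.
    alpha = 1.0 - normalized_entropy(model_probs)
    alpha = min(max(alpha, 0.0), 1.0)
    # Self-knowledge prior: the model's current distribution, scaled by alpha.
    target = [alpha * p for p in model_probs]
    # The remaining probability mass stays on the gold label.
    target[gold_index] += 1.0 - alpha
    return alpha, target

def cross_entropy(target, model_probs, eps=1e-12):
    """Cross-entropy between a soft target and the model distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, model_probs))

# Two hypothetical predictions over three classes; the gold class is index 0.
confident = [0.97, 0.02, 0.01]
uncertain = [0.40, 0.35, 0.25]
alpha_c, target_c = adaptive_soft_target(confident, gold_index=0)
alpha_u, target_u = adaptive_soft_target(uncertain, gold_index=0)
loss_c = cross_entropy(target_c, confident)
```

Under this assumed rule, the confident prediction receives a larger smoothing weight than the uncertain one, so the amount of smoothing varies per instance instead of being fixed in advance.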

artificial intelligence, machine learning, natural language, (19 more...)

arXiv:2210.13459

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.86)

Self-Knowledge Distillation in Natural Language Processing

Hahn, Sangchul, Choi, Heeyoul

arXiv.org Machine Learning, Aug-2-2019

Since deep learning became a key player in natural language processing (NLP), many deep learning models have shown remarkable performance in a variety of NLP tasks, and in some cases, they even outperform humans. Such high performance can be explained by the efficient knowledge representation of deep learning models. While many methods have been proposed to learn more efficient representations, knowledge distillation from pretrained deep networks suggests that we can use more information from the soft target probability to train other neural networks. In this paper, we propose a new knowledge distillation method, self-knowledge distillation, based on the soft target probabilities of the training model itself, where multimode information is distilled from the word embedding space right below the softmax layer. Due to the time complexity, our method approximates the soft target probabilities. In experiments, we applied the proposed method to two different and fundamental NLP tasks: language modeling and neural machine translation. The experimental results show that our proposed method improves performance on the tasks.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv:1908.01851

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)